Roles of Macro-Actions in Accelerating Reinforcement Learning

Authors

  • Amy McGovern
  • Richard S. Sutton
  • Andrew H. Fagg
Abstract

We analyze the use of built-in policies, or macro-actions, as a form of domain knowledge that can improve the speed and scaling of reinforcement learning algorithms. Such macro-actions are often used in robotics, and macro-operators are also well-known as an aid to state-space search in AI systems. The macro-actions we consider are closed-loop policies with termination conditions. The macro-actions can be chosen at the same level as primitive actions. Macro-actions commit the learning agent to act in a particular, purposeful way for a sustained period of time. Overall, macro-actions may either accelerate or retard learning, depending on the appropriateness of the macro-actions to the particular task. We analyze their effect in a simple example, breaking the acceleration effect into two parts: 1) the effect of the macro-action in changing exploratory behavior, independent of learning, and 2) the effect of the macro-action on learning, independent of its effect on behavior. In our example, both effects are significant, but the latter appears to be larger. Finally, we provide a more complex gridworld illustration of how appropriately chosen macro-actions can accelerate overall learning.

Many problems in artificial intelligence (AI) are too large to be solved practically by searching the state-space using available primitive operators. By searching for the goal using only primitive operators, the AI system is bounded by both the depth and the breadth of the search tree. One way to overcome this difficulty is through macro-actions (or macros). By chunking together primitive actions into macro-actions, the effective length of the solution is shortened. Both [Korf, 1985] and [Iba, 1989] have demonstrated that using macro-actions to search for a solution has resulted in solutions in cases where the system was unable to find answers by searching in primitive state-space, and in finding faster solutions in cases where both systems could solve the problem.
Reinforcement learning (RL) is a collection of methods for discovering near-optimal solutions to stochastic sequential decision problems [Watkins, 1989]. An RL system interacts with the environment by executing actions and receiving rewards from the environment. Unlike supervised learning, RL does not rely on an outside teacher to specify the correct action for a given state. Instead, an RL system tries different actions and uses the feedback from the environment to determine a closed-loop policy which maximizes reward. In this work, we treat macro-actions as closed-loop policies with termination conditions. Prior work that has included closed-loop macro-In …
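As a rough illustration of the definition above (a macro-action as a closed-loop policy with a termination condition, selectable at the same level as primitive actions), here is a minimal Python sketch. The 5x5 gridworld, the names `MacroAction`, `step`, and `run_macro`, and the "go to the bottom-right corner" macro are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

State = Tuple[int, int]  # (row, col) in a hypothetical gridworld

SIZE = 5  # assumed 5x5 grid, chosen only for illustration
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(state: State, action: str) -> State:
    """Apply one primitive action; a move into a wall leaves the state unchanged."""
    dr, dc = MOVES[action]
    r, c = state[0] + dr, state[1] + dc
    if 0 <= r < SIZE and 0 <= c < SIZE:
        return (r, c)
    return state


@dataclass
class MacroAction:
    """A closed-loop policy together with a termination condition."""
    policy: Callable[[State], str]       # maps the current state to a primitive action
    terminates: Callable[[State], bool]  # True when the macro should stop


def run_macro(state: State, macro: MacroAction, max_steps: int = 100) -> Tuple[State, int]:
    """Follow the macro's policy until it terminates; return final state and duration."""
    steps = 0
    while not macro.terminates(state) and steps < max_steps:
        state = step(state, macro.policy(state))
        steps += 1
    return state, steps


# Hypothetical macro: head for the bottom-right corner, first right, then down.
# The policy consults the current state at every step, which makes it closed-loop.
go_to_corner = MacroAction(
    policy=lambda s: "right" if s[1] < SIZE - 1 else "down",
    terminates=lambda s: s == (SIZE - 1, SIZE - 1),
)

final_state, duration = run_macro((2, 0), go_to_corner)
print(final_state, duration)
```

In a learning agent, such a macro could sit in the same action set as the entries of `MOVES`, so that choosing it commits the agent to the macro's policy until `terminates` returns true. How values are updated after a macro finishes is left out here, since the excerpt above does not specify it.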


Similar resources

Description and Acquirement of Macro-Actions in Reinforcement Learning

Reinforcement learning is a framework for enabling agents to learn from interaction with environments. It has focused generally on Markov decision process (MDP) domains, but a domain may be non-Markovian in the real world. In this paper, we develop a new description of macro-actions for non-Markov decision process (NMDP) domains in reinforcement learning. A macro-action is an action control struct...


Macro-Actions in Reinforcement Learning: An Empirical Analysis (Amy McGovern and Richard …)

Several researchers have proposed reinforcement learning methods that obtain advantages in learning by using temporally extended actions, or macro-actions, but none has carefully analyzed what these advantages are. In this paper, we separate and analyze two advantages of using macro-actions in reinforcement learning: the effect on exploratory behavior, independent of learning, and the effect on t...


Macro-Actions in Reinforcement Learning: An Empirical Analysis

Several researchers have proposed reinforcement learning methods that obtain advantages in learning by using temporally extended actions, or macro-actions, but none has carefully analyzed what these advantages are. In this paper, we separate and analyze two advantages of using macro-actions in reinforcement learning: the effect on exploratory behavior, independent of learning, and the effect on the sp...


A Pilot Study on the Evolution of Reward Signals for Hierarchical Reinforcement Learning

Recent research has shown that reinforcement learning agents can benefit greatly from the possibility of learning to select macro actions instead of, or alongside, fine primitive actions. The route usually followed to exploit this idea is to build agents with hierarchical architectures that can learn both a repertoire of macro actions and a macro policy that selects them, on the basis of the “fin...


Planning with Closed-Loop Macro Actions

Planning and learning at multiple levels of temporal abstraction is a key problem for artificial intelligence. In this paper we summarize an approach to this problem based on the mathematical framework of Markov decision processes and reinforcement learning. Conventional model-based reinforcement learning uses primitive actions that last one time step and that can be modeled independently of th...


Publication date: 1997